Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Oct 11, 2025

Fix for GeneratedRegex fixer preserving multiline patterns

This PR addresses the issue where the GeneratedRegex code fixer converts multiline verbatim string literals into single-line strings with escape sequences, losing readability.

Problem

When a regex pattern contains actual newlines (from verbatim string literals or string concatenation), the fixer converts them to escape sequences like \r\n, making patterns with RegexOptions.IgnorePatternWhitespace unreadable.

Solution

  • Analyze the codebase and understand the issue
  • Modify UpgradeToGeneratedRegexCodeFixer.cs to detect newlines in pattern strings
  • Update logic to preserve verbatim string format when pattern contains newlines
  • Add comprehensive tests for various scenarios:
    • Verbatim string with actual newlines
    • String concatenation with newlines
  • Build and test the changes
  • Verify all existing tests still pass
  • Verify no build warnings or errors
  • Address review feedback: Use IndexOfAny for better performance
  • Fix tests to match current property-based syntax

Changes Made

  1. Added ShouldUseVerbatimString helper method that checks for backslashes, newlines (\n), or carriage returns (\r) using IndexOfAny for optimal performance
  2. Updated pattern handling in GetNode method to use verbatim string syntax when any of these characters are present
  3. Added two new tests to validate the fix works for both direct verbatim strings and string concatenation:
    • MultilineVerbatimStringPreservedByFixer
    • MultilineStringConcatenationPreservedByFixer
  4. Updated tests to use property syntax ({ get; }) instead of method syntax to match current codebase

Test Results

✅ All analyzer tests pass (116 tests)
✅ New tests specifically validate multiline pattern preservation
✅ No build warnings or errors
✅ Works for both direct verbatim strings and string concatenation scenarios

Impact

This fix greatly improves readability for regex patterns with RegexOptions.IgnorePatternWhitespace, particularly for complex patterns in projects like MSBuild that use string fragment composition. The fix preserves the original formatting intent from the source code.

Original prompt

This section details on the original issue you should resolve

<issue_title>GeneratedRegex fixer forces pattern onto a single line</issue_title>
<issue_description>
Consider

   private static Regex r = new Regex(@"a
             b
             c", RegexOptions.IgnorePatternWhitespace);

run the fixer, now I have

    [GeneratedRegex("a\r\n             b\r\n             c", RegexOptions.IgnorePatternWhitespace)]
    private static partial Regex MyRegex();

The semantics are the same, but the readability is gone. I would expect

    [GeneratedRegex(@"a
         b
         c", RegexOptions.IgnorePatternWhitespace)]
    private static partial Regex MyRegex();

I see #69616 which implies that whitespace is preserved but not comments. I don't see whitespace preserved, or at least not in the original form.</issue_description>

Comments on the Issue (you are @copilot in this section)

@ Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions See info in [area-owners.md](https://github.com/dotnet/runtime/blob/main/docs/area-owners.md) if you want to be subscribed.
Issue Details
Consider
 ```c#
   private static Regex r = new Regex(@"a
             b
             c", RegexOptions.IgnorePatternWhitespace);

run the fixer, now I have

    [GeneratedRegex("a\r\n             b\r\n             c", RegexOptions.IgnorePatternWhitespace)]
    private static partial Regex MyRegex();

The semantics are the same, but the readability is gone. I would expect

    [GeneratedRegex(@"a
         b
         c", RegexOptions.IgnorePatternWhitespace)]
    private static partial Regex MyRegex();

I see #69616 which implies that whitespace is preserved but not comments. I don't see whitespace preserved, or at least not in the original form.

Author: danmoseley
Assignees: -
Labels:

area-System.Text.RegularExpressions

Milestone: -
@danmoseley This comes up eg., in this MSBuild example ```c# const string itemMetadataSpecification = @"%\(\s*; (?(?[A-Za-z_][A-Za-z_0-9\-]*)\s*\.\s*)? (?[A-Za-z_][A-Za-z_0-9\-]*@ \s*\)"; private Regex s_itemMetadataPattern = new(itemMetadataSpecification, RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture); ``` nice and pretty, I run the fixer and it produces this ugly thing ```c# [GeneratedRegex("%\\(\\s*;\r\n (?(?[A-Za-z_][A-Za-z_0-9\\-]*)\\s*\\.\\s*)?\r\n (?[A-Za-z_][A-Za-z_0-9\\-]*@\r\n \\s*\\)", RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace)] private static partial Regex MyRegex(); ``` as an aside, by running the fixer I'm implicitly okaying it inlining the compound strings, which I think is fine (and inevitable) @danmoseley Built latest bits and found it's already fixed by #78172 @danmoseley Actually, it's only partially fixed. Consider ```c# private const string foo = "bar"; private static Regex r1 = new Regex(@"a " + foo + @" b c", RegexOptions.IgnorePatternWhitespace);
private static Regex r2 = new Regex(@"a        bar
                                        b
                                        c", RegexOptions.IgnorePatternWhitespace);
Both should produce identical results. However, in the first case, I lose the visible whitespace --
```c#
    [GeneratedRegex("a        bar\r\n                                            b\r\n                                            c", RegexOptions.IgnorePatternWhitespace)]
    private static partial Regex MyRegex();
   
   [GeneratedRegex(@"a        bar
                                            b
                                            c", RegexOptions.IgnorePatternWhitespace)]
    private static partial Regex MyRegex1();

Unfortunately many of the dotnet/msbuild regexes are built up by compounding reused string fragments, which in some cases are compounded other ones, and use IgnorePatternWhitespace. This means after running the generator, they need to be fixed by hand.</comment_new>

Fixes #79891


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI changed the title [WIP] Fix GeneratedRegex fixer to preserve pattern formatting Fix GeneratedRegex fixer to preserve multiline verbatim string patterns Oct 11, 2025
Copilot AI requested a review from stephentoub October 11, 2025 16:43
Copilot finished work on behalf of stephentoub October 11, 2025 16:43
@stephentoub stephentoub marked this pull request as ready for review October 11, 2025 22:50
@Copilot Copilot AI review requested due to automatic review settings October 11, 2025 22:50
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes the GeneratedRegex code fixer to preserve multiline verbatim string patterns instead of converting them to single-line strings with escape sequences. The fix addresses readability issues for regex patterns that use RegexOptions.IgnorePatternWhitespace, particularly in complex scenarios with string concatenation.

  • Added logic to detect patterns containing newlines or carriage returns in addition to backslashes
  • Enhanced pattern handling to preserve verbatim string syntax for better readability
  • Added comprehensive test coverage for both direct multiline patterns and string concatenation scenarios

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
src/libraries/System.Text.RegularExpressions/gen/UpgradeToGeneratedRegexCodeFixer.cs Modified the code fixer to detect newlines/carriage returns and preserve verbatim string syntax
src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/UpgradeToGeneratedRegexAnalyzerTests.cs Added two comprehensive tests to verify multiline verbatim string preservation

Copy link
Member

@tarekgh tarekgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easy enough, LGTM!

@stephentoub
Copy link
Member

@copilot, please fix the test failures:

  Discovering: System.Text.RegularExpressions.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Text.RegularExpressions.Tests (found 361 of 389 test cases)
  Starting:    System.Text.RegularExpressions.Tests (parallel test collections = on [6 threads], stop on fail = off)
    System.Text.RegularExpressions.Tests.UpgradeToGeneratedRegexAnalyzerTests.MultilineStringConcatenationPreservedByFixer [FAIL]
      System.InvalidOperationException : Context: Iterative code fix application
      content of '/0/Test0.cs' did not match. Diff shown with expected as baseline:
       using System.Text.RegularExpressions;
       
       static partial class Class
       {
           private const string foo = "bar";
      -    private static Regex r1 = MyRegex();
       
           [GeneratedRegex(@"a        bar
                                               b
                                               c", RegexOptions.IgnorePatternWhitespace)]
      -    private static partial Regex MyRegex();
      +    private static partial Regex r1 { get; }
       }
      
      Stack Trace:
        /_/src/Microsoft.CodeAnalysis.Testing/Microsoft.CodeAnalysis.Analyzer.Testing/DefaultVerifier.cs(105,0): at Microsoft.CodeAnalysis.Testing.DefaultVerifier.Fail(String message)
        /_/src/Microsoft.CodeAnalysis.Testing/Microsoft.CodeAnalysis.Analyzer.Testing/Extensions/IVerifierExtensions.cs(67,0): at Microsoft.CodeAnalysis.Testing.IVerifierExtensions.EqualOrDiff(IVerifier verifier, String expected, String actual, String message)
        /_/src/Microsoft.CodeAnalysis.Testing/Microsoft.CodeAnalysis.CodeFix.Testing/CodeFixTest`1.cs(537,0): at Microsoft.CodeAnalysis.Testing.CodeFixTest`1.VerifyProjectAsync(ProjectState newState, Project project, IVerifier verifier, CancellationToken cancellationToken)
        /_/src/Microsoft.CodeAnalysis.Testing/Microsoft.CodeAnalysis.CodeFix.Testing/CodeFixTest`1.cs(515,0): at Microsoft.CodeAnalysis.Testing.CodeFixTest`1.VerifyFixAsync(String language, ImmutableArray`1 analyzers, ImmutableArray`1 codeFixProviders, SolutionState oldState, SolutionState newState, Int32 numberOfIterations, Func`10 getFixedProject, IVerifier verifier, CancellationToken cancellationToken)
        /_/src/Microsoft.CodeAnalysis.Testing/Microsoft.CodeAnalysis.CodeFix.Testing/CodeFixTest`1.cs(477,0): at Microsoft.CodeAnalysis.Testing.CodeFixTest`1.VerifyFixAsync(SolutionState testState, SolutionState fixedState, SolutionState batchFixedState, IVerifier verifier, CancellationToken cancellationToken)
        /_/src/Microsoft.CodeAnalysis.Testing/Microsoft.CodeAnalysis.CodeFix.Testing/CodeFixTest`1.cs(316,0): at Microsoft.CodeAnalysis.Testing.CodeFixTest`1.RunImplAsync(CancellationToken cancellationToken)
        /_/src/Microsoft.CodeAnalysis.Testing/Microsoft.CodeAnalysis.Analyzer.Testing/AnalyzerTest`1.cs(172,0): at Microsoft.CodeAnalysis.Testing.AnalyzerTest`1.RunAsync(CancellationToken cancellationToken)
        /_/src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/CSharpCodeFixVerifier`2.cs(32,0): at System.Text.RegularExpressions.Tests.CSharpCodeFixVerifier`2.VerifyCodeFixAsync(String source, String fixedSource, ReferenceAssemblies references)
        /_/src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/UpgradeToGeneratedRegexAnalyzerTests.cs(1403,0): at System.Text.RegularExpressions.Tests.UpgradeToGeneratedRegexAnalyzerTests.MultilineStringConcatenationPreservedByFixer()
        --- End of stack trace from previous location ---
    System.Text.RegularExpressions.Tests.UpgradeToGeneratedRegexAnalyzerTests.MultilineVerbatimStringPreservedByFixer [FAIL]
      System.InvalidOperationException : Context: Iterative code fix application
      content of '/0/Test0.cs' did not match. Diff shown with expected as baseline:
       using System.Text.RegularExpressions;
       
       static partial class Class
       {
      -    private static Regex r = MyRegex();
      -
           [GeneratedRegex(@"a
                    b
                    c", RegexOptions.IgnorePatternWhitespace)]
      -    private static partial Regex MyRegex();
      +    private static partial Regex r { get; }
       }
      
      Stack Trace:
        /_/src/Microsoft.CodeAnalysis.Testing/Microsoft.CodeAnalysis.Analyzer.Testing/DefaultVerifier.cs(105,0): at Microsoft.CodeAnalysis.Testing.DefaultVerifier.Fail(String message)
        /_/src/Microsoft.CodeAnalysis.Testing/Microsoft.CodeAnalysis.Analyzer.Testing/Extensions/IVerifierExtensions.cs(67,0): at Microsoft.CodeAnalysis.Testing.IVerifierExtensions.EqualOrDiff(IVerifier verifier, String expected, String actual, String message)
        /_/src/Microsoft.CodeAnalysis.Testing/Microsoft.CodeAnalysis.CodeFix.Testing/CodeFixTest`1.cs(537,0): at Microsoft.CodeAnalysis.Testing.CodeFixTest`1.VerifyProjectAsync(ProjectState newState, Project project, IVerifier verifier, CancellationToken cancellationToken)
        /_/src/Microsoft.CodeAnalysis.Testing/Microsoft.CodeAnalysis.CodeFix.Testing/CodeFixTest`1.cs(515,0): at Microsoft.CodeAnalysis.Testing.CodeFixTest`1.VerifyFixAsync(String language, ImmutableArray`1 analyzers, ImmutableArray`1 codeFixProviders, SolutionState oldState, SolutionState newState, Int32 numberOfIterations, Func`10 getFixedProject, IVerifier verifier, CancellationToken cancellationToken)
        /_/src/Microsoft.CodeAnalysis.Testing/Microsoft.CodeAnalysis.CodeFix.Testing/CodeFixTest`1.cs(477,0): at Microsoft.CodeAnalysis.Testing.CodeFixTest`1.VerifyFixAsync(SolutionState testState, SolutionState fixedState, SolutionState batchFixedState, IVerifier verifier, CancellationToken cancellationToken)
        /_/src/Microsoft.CodeAnalysis.Testing/Microsoft.CodeAnalysis.CodeFix.Testing/CodeFixTest`1.cs(316,0): at Microsoft.CodeAnalysis.Testing.CodeFixTest`1.RunImplAsync(CancellationToken cancellationToken)
        /_/src/Microsoft.CodeAnalysis.Testing/Microsoft.CodeAnalysis.Analyzer.Testing/AnalyzerTest`1.cs(172,0): at Microsoft.CodeAnalysis.Testing.AnalyzerTest`1.RunAsync(CancellationToken cancellationToken)
        /_/src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/CSharpCodeFixVerifier`2.cs(32,0): at System.Text.RegularExpressions.Tests.CSharpCodeFixVerifier`2.VerifyCodeFixAsync(String source, String fixedSource, ReferenceAssemblies references)
        /_/src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/UpgradeToGeneratedRegexAnalyzerTests.cs(1370,0): at System.Text.RegularExpressions.Tests.UpgradeToGeneratedRegexAnalyzerTests.MultilineVerbatimStringPreservedByFixer()
        --- End of stack trace from previous location ---
  Finished:    System.Text.RegularExpressions.Tests
=== TEST EXECUTION SUMMARY ===
   System.Text.RegularExpressions.Tests  Total: 29323, Errors: 0, Failed: 2, Skipped: 0, Time: 56.871s

Copy link
Contributor Author

Copilot AI commented Oct 23, 2025

@copilot, please fix the test failures:

  Discovering: System.Text.RegularExpressions.Tests (method display = ClassAndMethod, method display options = None)

...

Fixed in commit 1ce8e01. The tests have been updated to use the property syntax ({ get; }) instead of method syntax to match the current codebase. All 116 analyzer tests now pass.

Copilot finished work on behalf of stephentoub October 23, 2025 18:23
@stephentoub stephentoub merged commit e317e1e into main Oct 25, 2025
86 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GeneratedRegex fixer forces pattern onto a single line

3 participants